Diphone synthesis using unit selection
Abstract
This paper describes an experimental AT&T concatenative synthesis system using unit selection, for which the basic synthesis units are diphones. The synthesizer may use any of the data from a large database of utterances. Since the database generally contains multiple instances of each concatenative unit, the system selects among candidates dynamically at synthesis time, in a manner that is based on and extends the unit selection implemented in the CHATR synthesis system [1][4]. Selected units may be either phones or diphones, and they can be synthesized by a variety of methods, including PSOLA [5], HNM [11], and simple unit concatenation. The AT&T system, with CHATR unit selection, was implemented within the framework of the Festival Speech Synthesis System [2]. The voice database amounted to approximately one and one-half hours of speech and was constructed from read text taken from three sources. The first source was a portion of the 1989 Wall Street Journal material from the Penn Treebank Project, so that the most frequent diphones were well represented. Complete diphone coverage was assured by the second text, which was designed for diphone databases [12]. The third source consisted of recorded prompts for telephone service applications. Formal subjective listening tests were conducted to compare speech quality across several options available in the AT&T synthesizer, including synthesis methods and choices of fundamental units. These tests showed that unit selection techniques can be successfully applied to diphone synthesis.
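The dynamic selection among candidate units described in the abstract is commonly framed as a lowest-cost path search over a lattice of candidates. The following is a minimal sketch of that general idea, not the AT&T or CHATR implementation: the `Unit` fields, the duration-based `target_cost`, and the pitch-based `join_cost` are hypothetical toy choices made only for illustration, and the search is a plain Viterbi pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One recorded instance of a diphone (toy features, assumed for illustration)."""
    diphone: str     # diphone label, e.g. "h-e"
    f0_start: float  # pitch at the left edge, in Hz
    f0_end: float    # pitch at the right edge, in Hz
    duration: float  # length in seconds


def target_cost(unit, target_duration):
    # Hypothetical target cost: mismatch between unit and requested duration.
    return abs(unit.duration - target_duration)


def join_cost(left, right):
    # Hypothetical join cost: pitch discontinuity at the concatenation point.
    return abs(left.f0_end - right.f0_start) / 100.0


def select_units(targets, database):
    """Pick one candidate per target so that the summed cost is minimal.

    targets:  list of (diphone label, target duration) pairs
    database: dict mapping diphone label -> list of Unit candidates
    Returns (chosen units, total cost).
    """
    best = []  # best[i][j] = (cumulative cost, index of best predecessor)
    for i, (name, duration) in enumerate(targets):
        column = []
        for cand in database[name]:
            tc = target_cost(cand, duration)
            if i == 0:
                column.append((tc, None))
            else:
                prev_units = database[targets[i - 1][0]]
                cost, back = min(
                    (best[i - 1][k][0] + join_cost(prev, cand), k)
                    for k, prev in enumerate(prev_units)
                )
                column.append((cost + tc, back))
        best.append(column)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(database[targets[i][0]][j])
        j = best[i][j][1]
    path.reverse()
    return path, total
```

In this sketch a candidate with a slightly worse target cost can still be chosen when it joins more smoothly to its neighbour, which is the trade-off that makes selection depend on the whole utterance rather than on each unit in isolation.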
Similar resources
Unit Size in Unit Selection Speech Synthesis
In this paper, we address the issue of choice of unit size in unit selection speech synthesis. We discuss the development of a Hindi speech synthesizer and our experiments with different choices of units: syllable, diphone, phone and half phone. Perceptual tests conducted to evaluate the quality of the synthesizers with different unit size indicate that the syllable synthesizer performs better ...
Combining non-uniform unit selection with diphone based synthesis
This paper describes the unit selection algorithm of a speech synthesis system, which selects the k-best paths over units from a relational unit database. The algorithm uses words and diphones as basic unit types. It is part of a customisable text-to-speech system designed for generating new prompts using a recorded speech corpus, with the option that the user can interactively optimise the resu...
Robust Unit Selection System for Speech Synthesis
There has been much interest for many years in diphone-based concatenative speech synthesis and, recently, a rapidly increasing interest in unit selection based synthesis (as illustrated by the CHATR [2] system). However, the limitations of both types of system are well known. While intelligibility is generally very high for diphone based systems, the resulting signals do not sound completely n...
Unit selection for speech synthesi target cos
This paper presents a new approach to unit selection for corpus-based speech synthesis, in which the units are selected according to acoustic criteria. In a learning stage, an acoustic clustering is carried out using context dependent HMM. During synthesis, an acoustic target is generated and segmented in the required diphone sequence. For each diphone to be synthesized, a pre-selection module ...